
    Large Scale Distributed Testing for Fault Classification and Isolation

    Developing confidence in the quality of software is an increasingly difficult problem. As the complexity and integration of software systems increase, the tools and techniques used to perform quality assurance (QA) tasks must evolve with them. To date, several quality assurance tools have been developed to help ensure the quality of modern software, but several limitations remain to be overcome. Among the challenges faced by current QA tools are (1) the increased use of distributed software solutions, (2) limited test resources and constrained time schedules, and (3) failures that are difficult to replicate and may occur only rarely. While existing distributed continuous quality assurance (DCQA) tools and techniques, including our own Skoll project, begin to address these issues, new and novel approaches are needed. This dissertation explores three strategies to that end.

    First, I present an improved version of our Skoll distributed quality assurance system. Skoll provides a platform for executing sophisticated, long-running QA processes across a large number of distributed, heterogeneous computing nodes. This dissertation details changes to Skoll that result in a more robust, configurable, and user-friendly implementation for both the client and server components. It also details infrastructure developed to support the evaluation of DCQA processes using Skoll -- specifically, the design and deployment of a dedicated 120-node computing cluster for evaluating DCQA practices. The techniques and case studies presented in the latter parts of this work used the improved Skoll as their testbed.

    Second, I present an adaptive-sampling technique for automatically classifying test execution outcomes, along with a case study on the Java Architecture for Bytecode Analysis (JABA) system. A common need for such techniques is the ability to distinguish test execution outcomes (e.g., to collect only data corresponding to some behavior, or to determine how often and under which conditions a specific behavior occurs). Most current approaches, however, do not perform any classification of remote executions: they either focus on easily observable behaviors (e.g., crashes) or assume that outcome classifications are externally provided (e.g., by users). In this work, I present an empirical study on JABA in which we automatically classified execution data into passing and failing behaviors using adaptive association trees.

    Finally, I present a long-term case study of the highly configurable MySQL open-source project. Real-world software systems can have configuration spaces that are too large to test exhaustively but that nonetheless contain subtle interactions leading to failure-inducing system faults. In the literature, covering arrays, in combination with classification techniques, have been used to sample these large configuration spaces effectively and to detect problematic configuration dependencies. Applying this approach in practice, however, is tricky because testing time and resource availability are unpredictable. We therefore developed and evaluated an alternative approach that incrementally builds covering array schedules. This approach begins at a low strength and then iteratively increases the strength as resources allow, reusing previous test results to avoid duplicated effort. The result is a set of test schedules that allow successful classification with fewer test executions and that require less test-subject-specific information to develop.
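
    The incremental covering-array idea can be sketched in a few lines. The following is a minimal, hypothetical Python illustration, not the dissertation's implementation: a greedy generator builds a strength-t schedule, first crediting configurations already executed at a lower strength so their coverage is not duplicated. The naive greedy search enumerates the full configuration space and is only practical for small examples; production tools such as ACTS or PICT would replace it.

        from itertools import combinations, product

        def covered_by(config, strength):
            # All strength-t (factor, value) tuples one full configuration covers.
            return {tuple((f, config[f]) for f in factors)
                    for factors in combinations(range(len(config)), strength)}

        def extend_schedule(options, strength, already_run):
            # Greedily add configurations until every t-tuple is covered,
            # crediting coverage from configurations that were already executed.
            all_tuples = {tuple(zip(factors, values))
                          for factors in combinations(range(len(options)), strength)
                          for values in product(*(options[f] for f in factors))}
            covered = set()
            for config in already_run:
                covered |= covered_by(config, strength)
            todo = all_tuples - covered
            schedule = []
            while todo:
                # Pick the configuration covering the most still-uncovered tuples.
                best = max(product(*options),
                           key=lambda c: len(covered_by(c, strength) & todo))
                schedule.append(best)
                todo -= covered_by(best, strength)
            return schedule

        # Example: three binary options and one ternary option.
        opts = [(0, 1), (0, 1), (0, 1), (0, 1, 2)]
        pairwise = extend_schedule(opts, strength=2, already_run=[])
        extra = extend_schedule(opts, strength=3, already_run=pairwise)
        print(len(pairwise), "runs at strength 2;", len(extra), "added for strength 3")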

    A Tool for Statistical Detection of Faults in Internet Protocol Networks

    While the number and variety of hazards to computer security have increased at an alarming rate, the proliferation of tools to combat these threats has not grown proportionally. Moreover, most current tools rely on human intervention to recognize and diagnose new threats. We propose a general framework for identifying hazardous computer transactions by analyzing key metrics of network transactions. While a thorough determination of the particular traits to track would be a product of the research, we hypothesize that some or all of the following variables would correlate strongly with certain undesirable network transactions:
    - Source Address
    - Destination Address/Port
    - Packet Size (overall, header, payload)
    - Packet Rate (overall, Source, Destination, Source/Destination)
    - Transaction Frequency (per Address)
    By examining statistical correlations between these variables, we hope to distinguish - and normalize for changes over time - a healthy network from one that is being attacked or performing an attack. Central to this research is that this class of information is available without the intervention of the participants in the network transactions and, in reality, can be collected without their knowledge. This characteristic could allow Internet service providers or corporations to identify threats without large-scale deployment of an intrusion detection mechanism on each system. Furthermore, combining the ability to identify the existence and source of a network threat with the automatic configuration capabilities of common network hardware allows rapid reaction to attacks by shutting down connectivity to the originators of the exploit. This paper will detail the design of a set of tools - dubbed Culebra - capable of remotely diagnosing troubled networks. We will then simulate an attack on a network to gauge the effectiveness of Culebra. Ultimately, the type of data gathered by these tools can be used to develop a database of attack patterns, which, in turn, could be used to proactively prevent assaults on networks from remote locations. (UMIACS-TR-2002-7)
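
    As a rough illustration of the kind of statistical tracking described above, the hypothetical Python sketch below (not Culebra itself; names and thresholds are invented) keeps a running mean and variance of per-source packet rates using Welford's method and flags rates whose z-score exceeds a cutoff. Because the baseline keeps updating, the monitor also normalizes for gradual changes over time, as the abstract calls for.

        import math
        from collections import defaultdict

        class RateMonitor:
            # Running mean/variance of one source's packet rate (Welford's method),
            # used to flag rates that deviate sharply from the learned baseline.
            def __init__(self, threshold=3.0):
                self.threshold = threshold      # z-score cutoff (assumed value)
                self.n = 0
                self.mean = 0.0
                self.m2 = 0.0                   # sum of squared deviations

            def update(self, rate):
                self.n += 1
                delta = rate - self.mean
                self.mean += delta / self.n
                self.m2 += delta * (rate - self.mean)

            def is_anomalous(self, rate):
                if self.n < 30:                 # need a baseline first
                    return False
                std = math.sqrt(self.m2 / (self.n - 1))
                return std > 0 and abs(rate - self.mean) / std > self.threshold

        monitors = defaultdict(RateMonitor)

        def observe(src_addr, packets_per_sec):
            mon = monitors[src_addr]
            alert = mon.is_anomalous(packets_per_sec)
            mon.update(packets_per_sec)         # keep adapting to drift
            return alert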

    Moving forward with combinatorial interaction testing

    Combinatorial interaction testing (CIT) is an efficient and effective method of detecting failures that are caused by the interactions of various system input parameters. In this paper, we discuss CIT, point out some of the difficulties of applying it in practice, and highlight some recent advances that have improved CIT’s applicability to modern systems. We also provide a roadmap for future research directions, one that we hope will lead to new CIT research and to higher-quality testing of industrial systems.
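
    For intuition about the savings CIT offers, the minimal Python check below (an illustrative sketch, not from the paper) verifies strength-2, i.e. pairwise, coverage: with three binary parameters, four well-chosen tests cover every value pair of every parameter pair, versus eight tests for exhaustive enumeration, and the gap widens rapidly as parameters are added.

        from itertools import combinations, product

        def is_pairwise_covering(suite, options):
            # Every value pair of every pair of factors must appear in some test.
            for (i, vi), (j, vj) in combinations(enumerate(options), 2):
                needed = set(product(vi, vj))
                seen = {(test[i], test[j]) for test in suite}
                if needed - seen:
                    return False
            return True

        # Three binary parameters: 2**3 = 8 exhaustive tests, but 4 suffice.
        options = [(0, 1), (0, 1), (0, 1)]
        suite = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
        print(is_pairwise_covering(suite, options))  # True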

    Trees

    Classification methods have trouble with missing data. Even CART, which was designed to deal with missing data, performs poorly when run with over 90% of the predictors unobserved. We use the Apriori algorithm to fit decision trees by converting the continuous predictors to categorical variables, bypassing the missing data problem by treating missing data as absent items. We demonstrate our methodology in a setting simulating a distributed, low-overhead quality assurance system, in which we have control over which predictors are missing for each observation. We also demonstrate how performance can be improved by the introduction of a simple adaptive sampling method.
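
    A minimal, hypothetical Python sketch of the missing-data trick (illustrative only; the item tokens are invented): each observation is encoded as a set of items such as "cpu=high" after binning the continuous predictors, a missing predictor simply contributes no item, and Apriori mines frequent itemsets over these sets, so no imputation is ever required.

        from itertools import combinations

        def apriori(transactions, min_support):
            # Minimal Apriori over transactions encoded as item sets.
            n = len(transactions)
            def support(itemset):
                return sum(itemset <= t for t in transactions) / n

            items = {frozenset([i]) for t in transactions for i in t}
            level = {s for s in items if support(s) >= min_support}
            frequent = set(level)
            k = 2
            while level:
                candidates = {a | b for a in level for b in level if len(a | b) == k}
                level = {c for c in candidates if support(c) >= min_support}
                frequent |= level
                k += 1
            return frequent

        # A missing predictor is simply not present in the observation's set.
        data = [
            frozenset({"cpu=high", "mem=low", "outcome=fail"}),
            frozenset({"cpu=high", "outcome=fail"}),        # mem unobserved
            frozenset({"cpu=low", "mem=low", "outcome=pass"}),
            frozenset({"cpu=low", "outcome=pass"}),         # mem unobserved
        ]
        for itemset in sorted(apriori(data, min_support=0.5), key=len):
            print(set(itemset))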